MEDIAN FILTER COMBINATIONS FOR VIDEO NOISE REDUCTION

CROSS-REFERENCE TO RELATED APPLICATIONS 
[0001] This application is a divisional of and claims priority to U.S. Application
Serial No. 09/545,233 filed on April 7, 2000 (allowed) (which is incorporated herein in its
entirety), which was a continuation-in-part application of U.S. Application Serial No.
09/442,595 filed on November 17, 1999, which was a continuation of U.S. Application Serial
No. 09/217,151 filed on December 21, 1998 (now U.S. Patent No. 5,988,863, issued
November 23, 1999), which was a continuation of U.S. Application Serial No. 08/594,815
filed January 30, 1996 (now U.S. Patent No. 5,852,565, issued December 22, 1998).

TECHNICAL FIELD 
[0002] This invention relates to electronic communication systems, and more
particularly to an advanced electronic television system having enhanced compression,
filtering, and display characteristics.

BACKGROUND 

[0003] The United States presently uses the NTSC standard for television 
transmissions. However, proposals have been made to replace the NTSC standard with an 
Advanced Television standard. For example, it has been proposed that the U.S. adopt digital 
standard-definition and advanced television formats at rates of 24 Hz, 30 Hz, 60 Hz, and 
60 Hz interlaced. It is apparent that these rates are intended to continue (and thus be 
compatible with) the existing NTSC television display rate of 60 Hz (or 59.94 Hz). It is also 
apparent that "3-2 pulldown" is intended for display on 60 Hz displays when presenting 
movies, which have a temporal rate of 24 frames per second (fps). However, while the above
proposal provides a menu of possible formats from which to select, each format only encodes
and decodes a single resolution and frame rate. Because the display or motion rates of these
formats are not integrally related to each other, conversion from one to another is difficult.
[0004] Further, this proposal does not provide a crucial capability of compatibility 
with computer displays. These proposed image motion rates are based upon historical rates 
which date back to the early part of this century. If a "clean-slate" were to be made, it is 
unlikely that these rates would be chosen. In the computer industry, where displays could 
utilize any rate over the last decade, rates in the 70 to 80 Hz range have proven optimal, with 
72 and 75 Hz being the most common rates. Unfortunately, the proposed rates of 30 and
60 Hz lack useful interoperability with 72 or 75 Hz, resulting in degraded temporal
performance. 

[0005] In addition, it is being suggested by some that interlace is required, due to a 
claimed need to have about 1000 lines of resolution at high frame rates, but based upon the 
notion that such images cannot be compressed within the available 18-19 mbits/second of a 
conventional 6 MHz broadcast television channel. 

[0006] It would be much more desirable if a single signal format were to be adopted, 
containing within it all of the desired standard and high definition resolutions. However, to 
do so within the bandwidth constraints of a conventional 6 MHz broadcast television channel 
requires compression and "scalability" of both frame rate (temporal) and resolution (spatial). 
One method specifically intended to provide for such scalability is the MPEG-2 standard.
Unfortunately, the temporal and spatial scalability features specified within the MPEG-2
standard (and newer standards, like MPEG-4) are not sufficiently efficient to accommodate
the needs of advanced television for the U.S. Thus, the proposal for advanced television for
the U.S. is based upon the premise that temporal (frame rate) and spatial (resolution) layering 
are inefficient, and therefore discrete formats are necessary. 

[0007] Further, it would be desirable to provide enhancements to resolution, image 
clarity, coding efficiency, and video production efficiency. The present invention provides 
such enhancements. 

SUMMARY 

[0008] The invention provides a number of enhancements to handle a variety of video 
quality and compression problems. The following describes a number of such enhancements, 
most of which are preferably embodied as a set of tools which can be applied to the tasks of
enhancing images and compressing such images. The tools can be combined by a content 
developer in various ways, as desired, to optimize the visual quality and compression 
efficiency of a compressed data stream, particularly a layered compressed data stream. 

[0009] Such tools include improved de-interlacing and noise reduction enhancements,
including motion analysis. 

[0010] The details of one or more embodiments of the invention are set forth in the 
accompanying drawings and the description below. Other features, objects, and advantages of 
the invention will be apparent from the description and drawings, and from the claims. 

DESCRIPTION OF DRAWINGS 
[0011] FIG. 1A is a block diagram of an odd-field de-interlacer.

[0012] FIG. 1B is a block diagram of an even-field de-interlacer.

[0013] FIG. 2 is a block diagram of a frame de-interlacer using three de-interlaced
fields.

[0014] FIG. 3 is a block diagram of a threshold test.

[0015] FIG. 4 is a block diagram of a preferred combination of median filters.

[0016] Like reference symbols in the various drawings indicate like elements. 

DETAILED DESCRIPTION 
[0017] Throughout this description, the preferred embodiment and examples shown
should be considered as exemplars, rather than as limitations on the invention.

[0018] A number of enhancements may be made to handle a variety of video quality
and compression problems. The following describes a number of such enhancements, most of
which are preferably embodied as a set of tools which can be applied to the tasks of
enhancing images and compressing such images. The tools can be combined by a content
developer in various ways, as desired, to optimize the visual quality and compression
efficiency of a compressed data stream, particularly a layered compressed data stream.



De-Interlacing and Noise Reduction Enhancements 

Overview 

[0019] Experimentation has shown that many de-interlacing algorithms and devices 
depend upon the human eye to integrate fields to create an acceptable result. However, since 
compression algorithms are not a human eye, any integration of de-interlaced fields should 
take into account the characteristics of such algorithms. Without such careful de-interlaced
integration, the compression process will create high levels of noise artifacts, both wasting
bits (hindering compression) and making the image look noisy and busy with artifacts.
This distinction between de-interlacing for viewing (such as with line-doublers and line-
quadruplers) vs. de-interlacing as input to compression has led to the techniques described
below. In particular, the de-interlacing techniques described below are useful as input to
single-layer non-interlaced MPEG-like compression, as well as to layered MPEG-like compression.
[0020] Further, noise reduction must similarly match the needs of being an input to 
compression algorithms, rather than just reducing noise appearance. The goal is generally to 
reproduce, upon decompression, no more noise than the original camera or film-grain noise. 
Equal noise is generally considered acceptable, after compression/decompression. Reduced 
noise, with sharpness and clarity equivalent to the original, is a bonus. The noise reduction
described below achieves these goals. 

[0021] Further, for very noisy shots, such as from high speed film or with high 
camera sensitivity settings, usually in low light, noise reduction can be the difference between 
a good-looking compressed/decompressed image vs. one which is unwatchably noisy. The
compression process greatly amplifies noise which is above some threshold of acceptability
to the compressor. Thus, the use of noise-reduction pre-processing to keep noise below this
threshold may be required for acceptably good quality results.

De-Graining and Noise-Reducing Filters 

[0022] It has been found through experimentation that applying de-graining and/or 
noise-reducing filtering before layered or non-layered encoding improves the ability of the 
compression system to perform. While de-graining or noise-reduction is most effective on 
grainy or noisy images prior to compression, either process may be helpful when used in 
moderation even on relatively low noise or low grain pictures. Any of several known de-
graining or noise-reduction algorithms may be applied. Examples are "coring", simple
neighbor median filters, and softening filters. 

[0023] Whether noise-reduction is needed is determined by how noisy the original 
images are. For interlaced original images, the interlace itself is a form of noise, which 
usually will require additional noise reduction filtering, in addition to the complex de-
interlacing process described below. For progressive scan (non-interlaced) camera or film
images, noise processing is useful in layered and non-layered compression when noise is
present above a certain level. 

[0024] There are different types of noise. For example, video transfers from film
include film grain noise. Film grain noise is caused by silver grains which couple to yellow,
cyan, and magenta film dyes. Yellow affects both red and green, cyan affects both blue and 
green, and magenta affects both red and blue. Red is formed where yellow and magenta dye 
crystals overlap. Similarly green is the overlap of yellow and cyan, and blue is the overlap of 
magenta and cyan. Thus, noise between colors is partially correlated through the dyes and 
grains between pairs of colors. Further, when multiple grains overlap in all three colors, as 
they do in a print in dark regions of the image or on a negative in light regions of the image
(dark on the negative), additional color combinations occur. This correlation between the
colors can be utilized in film-grain noise reduction, but is a complex process. Further, many
different film types are used, and each type has different grain sizes, shapes, and statistical
distributions. 

[0025] For video images created by CCD-sensor and other (e.g., tube) sensor
cameras, the red, green, and blue noise is uncorrelated. In this case, it is best to process the
red, green, and blue records independently. Thus, red noise is reduced with self-red
processing independently of green noise and blue noise; the same approach applies to green 
and blue noise. 

[0026] Thus, noise processing is best matched to the characteristics of the noise 
source itself. In the case of a composite image (from multiple sources), the noise may differ 
in characteristics over different portions of the image. In this situation, generic noise 
processing may be the only option, if noise processing is needed. 

[0027] It has also been found useful in some cases to perform a "re-graining" or "re- 
noising" process after decoding a compressed layered data stream, as a creative effect, since 
some de-grained or de-noised images may be "too clean" or "too sterile" in appearance. Re- 
graining and/or re-noising are relatively easy effects to add in the decoder using any of 
several known algorithms. For example, this can be accomplished by the addition of low pass 
filtered random noise of suitable amplitude. 
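
By way of illustration only, the re-graining step just described might be sketched as follows in Python. The function name `regrain_line`, the uniform noise source, the 3-tap [0.25, 0.5, 0.25] low pass kernel, the default amplitude, and the assumption that pixel values are normalized to the range 0 to 1 are all choices made for this sketch; the description does not specify them.

```python
import random


def regrain_line(line, amplitude=0.02, seed=0):
    """Add low pass filtered random noise to one scanline of pixel values.

    All parameters are illustrative assumptions, not values from the text.
    """
    rng = random.Random(seed)
    # Raw noise in [-1, 1], one sample per pixel.
    noise = [rng.uniform(-1.0, 1.0) for _ in line]
    last = len(line) - 1
    out = []
    for i, value in enumerate(line):
        left = noise[max(i - 1, 0)]
        right = noise[min(i + 1, last)]
        # Simple 3-tap low pass filter over the noise (assumed kernel).
        smooth = 0.25 * left + 0.5 * noise[i] + 0.25 * right
        # Scale to the desired amplitude, add, and clip to the valid range.
        out.append(min(1.0, max(0.0, value + amplitude * smooth)))
    return out
```

A two-dimensional version would filter the noise vertically as well; the one-dimensional form is used here only to keep the example short.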

De-Interlacing Before Compression 

[0028] As mentioned above, the preferred compression method for interlaced source 
which is ultimately intended for non-interlaced display includes a step to de-interlace the 
interlaced source before the compression steps. De-interlacing a signal after decoding in the 
receiver, where the signal has been compressed in the interlaced mode, is both more costly 
and less efficient than de-interlacing prior to compression, and then sending a non-interlaced 
compressed signal. The non-interlaced compressed signal can be either layered or non-
layered (i.e., a conventional single layer compression).



Attorney Docket No.: 07314-005002 

[0029] Experimentation has shown that filtering a single field of an interlaced source,
and using that field as if it were a non-interlaced full frame, gives poor and noisy
compression results. Thus, using a single-field de-interlacer prior to compression is not a
good approach. Instead, experimentation has shown that a three-field-frame de-interlacer
process using field synthesized frames ("field-frames"), with weights of [0.25, 0.5, 0.25] for
the previous, current, and next field-frames, respectively, provides a good input for
compression. Combining three field-frames may be performed using other weights (although
these weights are optimal) to create a de-interlaced input to a compression process.

[0030] In the preferred de-interlacing system, a field-de-interlacer is used as the first
step in the overall process to create field-frames. In particular, each field is de-interlaced,
creating a synthesized frame where the total number of lines in the frame is derived from the
half number of lines in a field. Thus, for example, an interlaced 1080 line image will have
540 lines per even and odd field, each field representing 1/60th of a second. Normally, the
even and odd fields of 540 lines will be interlaced to create 1080 lines for each frame, which
represents 1/30th of a second. However, in the preferred embodiment, the de-interlacer copies
each scanline without modification from a specified field (e.g., the odd fields) to a buffer that
will hold some of the de-interlaced result. The remaining intermediate scanlines (in this
example, the even scanlines) for the frame are synthesized by adding half of the field line
above and half of the field line below each newly stored line. For example, the pixel values of
line 2 for a frame would each comprise 1/2 of the summed corresponding pixel values from
each of line 1 and line 3. The generation of intermediate synthesized scanlines may be done
on the fly, or may be computed after all of the scanlines from a field are stored in a buffer.
The same process is repeated for the next field, although the field types (i.e., even, odd) will
be reversed.
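
The field de-interlacing step described above might be sketched as follows in Python, purely as an illustration. The function name `deinterlace_field`, the list-of-lists image layout, and the handling of the top and bottom edge lines (repeating the only available neighbor) are assumptions of this sketch; the description does not address edge lines. Pixel values are treated here as if already linear.

```python
def deinterlace_field(field_lines, field_is_odd=True):
    """Synthesize a full field-frame from the scanlines of one field.

    field_lines: list of scanlines, each a list of pixel values.
    Returns a frame with twice as many scanlines as the field.
    """
    height = 2 * len(field_lines)
    frame = [None] * height
    # The odd field holds frame lines 1, 3, 5, ... (1-based),
    # which are list indices 0, 2, 4, ...
    offset = 0 if field_is_odd else 1
    for i, line in enumerate(field_lines):
        frame[2 * i + offset] = list(line)  # copied without modification
    for y in range(height):
        if frame[y] is not None:
            continue
        above = frame[y - 1] if y - 1 >= 0 else None
        below = frame[y + 1] if y + 1 < height else None
        # Assumed edge rule: reuse the single neighbor that exists.
        if above is None:
            above = below
        if below is None:
            below = above
        # Half of the field line above plus half of the field line below.
        frame[y] = [0.5 * a + 0.5 * b for a, b in zip(above, below)]
    return frame
```

For a 540-line field this produces a 1080-line field-frame; calling it alternately with `field_is_odd=True` and `field_is_odd=False` yields the sequence of field-frames used in the next step.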

[0031] FIG. 1A is a block diagram of an odd-field de-interlacer, showing that the odd
lines from an odd field 10 are simply copied to a de-interlaced odd field 12, while the even
lines are created by averaging adjacent odd lines from the original odd field together to form
the even lines of the de-interlaced odd field 12. Similarly, FIG. 1B is a block diagram of an
even-field de-interlacer, showing that the even lines from an even field 14 are simply copied
to a de-interlaced even field 16, while the odd lines are created by averaging adjacent even
lines from the original even field together to form the odd lines of the de-interlaced even field
16. Note that this case corresponds to "top field first"; "bottom field first" could also be
considered the "even" field.

[0032] As a next step, a sequence of these de-interlaced fields is then used as input to
a three-field-frame de-interlacer to create a final de-interlaced frame. FIG. 2 is a block
diagram showing how the pixels of each output frame are composed of 25% of the
corresponding pixels from a previous de-interlaced field (field-frame) 22, 50% of the
corresponding pixels from a current field-frame 24, and 25% of the corresponding pixels
from the next field-frame 26.
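
As an illustrative sketch only, the three-field-frame combination of FIG. 2 can be written in a few lines of Python. The function name `blend_three_field_frames` and the list-of-lists image layout are assumptions of this sketch, and the inputs are assumed to have already been converted to linear values.

```python
def blend_three_field_frames(prev_ff, cur_ff, next_ff,
                             weights=(0.25, 0.5, 0.25)):
    """Weighted per-pixel sum of previous, current, and next field-frames."""
    w_prev, w_cur, w_next = weights
    return [
        [w_prev * p + w_cur * c + w_next * n
         for p, c, n in zip(row_prev, row_cur, row_next)]
        for row_prev, row_cur, row_next in zip(prev_ff, cur_ff, next_ff)
    ]
```

Other weights can be passed through the `weights` argument, as the description allows.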

[0033] The new de-interlaced frame then contains far fewer interlace difference
artifacts between frames than do the three field-frames of which it is composed. However, 
there is a temporal smearing by adding the previous field-frame and next field-frame into a 
current field-frame. This temporal smearing is usually not objectionable, especially in light of 
the de-interlacing improvements which result. 

[0034] This de-interlacing process is very beneficial as input to compression, either
single layer (unlayered) or layered. It is also beneficial just as a treatment for interlaced video
for presentation, viewing, or making still frames, independent of use with compression. The
picture from the de-interlacing process appears "clearer" than the presentation of the interlace
directly, or of the de-interlaced fields.

De-Interlace Thresholding 

[0035] Although the de-interlace three-field sum weightings of [0.25, 0.5, 0.25]
discussed above provide a stable image, moving parts of a scene can sometimes become soft
or can exhibit aliasing artifacts. To counteract this, a threshold test may be applied which
compares the result of the [0.25, 0.5, 0.25] temporal filter against the corresponding pixel
values of only the middle field-frame. If a middle field-frame pixel value differs more than a
specified threshold amount from the value of the corresponding pixel from the three-field-
frame temporal filter, then only the middle field-frame pixel value is used. In this way, a
pixel from the three-field-frame temporal filter is selected where it differs less than the
threshold amount from the corresponding pixel of the single de-interlaced middle field-frame,
and the middle field-frame pixel value is used when there is more difference than the
threshold. This allows fast motion to be tracked at the field rate, and smoother parts of the
image to be filtered and smoothed by the three-field-frame temporal filter. This combination
has proven an effective, if not optimal, input to compression. It is also very effective for
processing for direct viewing to de-interlace image material (also called line doubling in
conjunction with display).

[0036] The preferred embodiment for such threshold determinations uses the
following equations for corresponding RGB color values from the middle (single) de-
interlaced field-frame image and the three-field-frame de-interlaced image:
[0037] Rdiff = R_single_field_de-interlaced minus R_three_field_de-interlaced
[0038] Gdiff = G_single_field_de-interlaced minus G_three_field_de-interlaced
[0039] Bdiff = B_single_field_de-interlaced minus B_three_field_de-interlaced
[0040] ThresholdingValue = abs(Rdiff + Gdiff + Bdiff) + abs(Rdiff) + abs(Gdiff) + abs(Bdiff)

[0041] The ThresholdingValue is then compared to a threshold setting. Typical 
threshold settings are in the range of 0.1 to 0.3, with 0.2 being most common. FIG. 3 shows a 
block diagram of this threshold test. The PROCESSING block 30 multiplies the inputs by
[0.25, 0.5, 0.25] and sums the results. The SELECTION CONTROL block 32 compares the 
output 36 of the PROCESSING block 30 with Input B 34 using the above equations for 
Rdiff, Gdiff, Bdiff, and ThresholdingValue. The switch selects the PROCESSING output 36 
if the ThresholdingValue is less than the threshold, otherwise the switch selects Input B 34, 
the middle value, for the output 38. 
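
The threshold test of FIG. 3 might be sketched as follows in Python, for illustration only. The function names `thresholding_value` and `select_pixel` are hypothetical, and the sketch assumes RGB values normalized to the range 0 to 1, which is an inference from the typical threshold settings rather than something the description states.

```python
def thresholding_value(single_rgb, three_rgb):
    """ThresholdingValue from the Rdiff, Gdiff, Bdiff equations above."""
    r_diff = single_rgb[0] - three_rgb[0]
    g_diff = single_rgb[1] - three_rgb[1]
    b_diff = single_rgb[2] - three_rgb[2]
    return (abs(r_diff + g_diff + b_diff)
            + abs(r_diff) + abs(g_diff) + abs(b_diff))


def select_pixel(single_rgb, three_rgb, threshold=0.2):
    """Pick the three-field-frame pixel below the threshold, else the middle one."""
    if thresholding_value(single_rgb, three_rgb) < threshold:
        return three_rgb   # PROCESSING output 36
    return single_rgb      # Input B 34, the middle field-frame
```

A small difference such as (0.50, 0.50, 0.50) against (0.52, 0.50, 0.49) stays well under the 0.2 threshold, so the temporally filtered pixel is kept; a large difference caused by fast motion switches the output to the middle field-frame pixel.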

[0042] In order to remove noise from this threshold, smooth-filtering the three-field- 
frame and single-field-frame de-interlaced pictures can be used before comparing and 
thresholding them. This smooth filtering can be accomplished simply by down filtering (e.g., 
down filtering by two), and then up filtering (e.g., using a Gaussian up-filter by two). This
"down-up" smoothed filter can be applied to both the single-field-frame de-interlaced picture
and the three-field-frame de-interlaced picture. The smoothed single-field-frame and three-
field-frame pictures can then be compared to compute a ThresholdingValue and then
thresholded to determine which picture will source each final output pixel. 
[0043] In particular, the threshold test is used as a switch to select between the single- 
field-frame de-interlaced picture and the three-field-frame temporal filter combination of 
single-field-frame de-interlaced pictures. This selection then results in an image where the 
pixels are from the three-field-frame de-interlacer in those areas where that image differs in 
small amounts (i.e., below the threshold) from the single field-frame image, and where the
pixels are from the single field-frame image in those areas where the three-field-frame
differed more than the threshold amount from the single-field-frame de-interlaced pixels
(after smoothing). 

[0044] This technique has proven effective in preserving single-field fast motion 
details (by switching to the single-field-frame de-interlaced pixels), while smoothing large 
portions of the image (by switching to the three-field-frame de-interlaced temporal filter 
combination). 

[0045] In addition to selecting between the single-field-frame and three-field-frame 
de-interlaced image, it is also often beneficial to add a bit of the single-field-frame image to 
the three-field-frame de-interlaced picture, to preserve some of the immediacy of the single 
field pictures over the entire image. This immediacy is balanced against the temporal
smoothness of the three-field-frame filter. A typical blending is to create a new frame by
adding 33.33% (1/3) of a single middle field-frame to 66.67% (2/3) of the corresponding 
three-field-frame smoothed image. This can be done before or after threshold switching, since 
the result is the same either way, only affecting the smoothed three-field-frame picture. Note 
that this is effectively equivalent to using a different proportion of the three field-frames, 
rather than the original three-field-frame weights of [0.25, 0.5, 0.25]. Computing 2/3 of [0.25,
0.5, 0.25] plus 1/3 of [0, 1, 0] yields [0.1667, 0.6666, 0.1667] as the temporal filter for the
three field-frames. The more heavily weighted center (current) field-frame brings additional 
immediacy to the result, even in the smoothed areas which fell below the threshold value. 
This combination has proven effective in balancing temporal smoothness with immediacy in 
the de-interlacing process for moving parts of a scene. 
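
The equivalent temporal filter stated above can be checked term by term. The worked form below is offered only as a convenience; the exact fractions are 1/6, 2/3, and 1/6, which the text gives in decimal form (the center weight 0.6666 being a truncation of 2/3).

```latex
\tfrac{2}{3}\,[\,0.25,\ 0.5,\ 0.25\,] + \tfrac{1}{3}\,[\,0,\ 1,\ 0\,]
  = \left[\tfrac{1}{6},\ \tfrac{1}{3} + \tfrac{1}{3},\ \tfrac{1}{6}\right]
  = \left[\tfrac{1}{6},\ \tfrac{2}{3},\ \tfrac{1}{6}\right]
  \approx [\,0.1667,\ 0.6667,\ 0.1667\,]
```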

Use of Linear Filters 

[0046] Sums, filters, or matrices involving video pictures should take into account the 
fact that pixel values in video are non-linear signals. For example, the video curve for HDTV 
can be several variations of coefficients and factors, but a typical formula is the international 
CCIR XA-11 (now called Rec. 709):

[0047] V = 1.0993 * L^0.45 - 0.0993 for L > 0.018051
[0048] V = 4.5 * L for L <= 0.018051

[0049] where V is the video value and L is linear light luminance.
[0050] The variations adjust the threshold (0.018051) a little, the factor (4.5) a little
(e.g., 4.0), and the exponent (0.45) a little (e.g., 0.4). The fundamental formula, however,
remains the same. 
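
For illustration, the formula above and its inverse might be coded as follows in Python. The function names `linear_to_video` and `video_to_linear` are hypothetical. The inverse is not given in the description; it is obtained here simply by solving the two branches of the formula for L.

```python
def linear_to_video(lum):
    """Linear light L (0..1) to non-linear video value V, per the formula above."""
    if lum <= 0.018051:
        return 4.5 * lum
    return 1.0993 * lum ** 0.45 - 0.0993


def video_to_linear(video):
    """Inverse mapping, derived algebraically from the two branches."""
    # The linear segment ends at V = 4.5 * 0.018051.
    if video <= 4.5 * 0.018051:
        return video / 4.5
    return ((video + 0.0993) / 1.0993) ** (1.0 / 0.45)
```

The constants are those of the typical formula only; a variation with a different threshold, factor, or exponent would need its own matching constants for the two branches to meet.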

[0051] A matrix operation, such as a RGB to/from YUV conversion, implies linear 
values. The fact that MPEG in general uses the video non-linear values as if they were linear
results in leakage between the luminance (Y) and the color values (U and V). This leakage
interferes with compression efficiency. The use of a logarithmic representation, such as is
used with film density units, corrects much of this problem. The various types of MPEG
encoding are neutral to the non-linear aspects of the signal, although its efficiency is affected
due to the use of the matrix conversion RGB to/from YUV. YUV (U = R-Y, V = B-Y)
should have Y computed as a linearized sum of 0.59 G, plus 0.29 R, plus 0.12 B (or slight
variations on these coefficients). However, U (= R-Y) becomes equivalent to R/Y in
logarithmic space, which is orthogonal to luminance. Thus, a shaded orange ball will not vary 
the U (= R-Y) parameter in a logarithmic representation. The brightness variation will be 
represented completely in the Luminance parameter, where full detail is provided. 
[0052] The linear vs. logarithmic vs. video issue impacts filtering. A key point to note 
is that small signal excursions (e.g., 10% or less) are approximately correct when a non-linear
video signal is processed as if it were a linear signal. This is because a piece-wise linear 
approximation to the smooth video-to-from-linear conversion curve is reasonable. However, 
for large excursions, a linear filter is much more effective, and produces much better image 
quality. Accordingly, if large excursions are to be optimally coded, transformed, or otherwise 
processed, it would be desirable to first convert the non-linear signal to a linear one in order 
to be able to apply a linear filter. 

[0053] De-interlacing is therefore much better when each filter and summation step
utilizes conversions to linear values prior to filtering or summing. This is due to the large
signal excursions inherent in interlaced signals at small details of the image. After filtering,
the image signals are converted back to the non-linear video digital representation. Thus, the
three-field-frame weighting (e.g., [0.25, 0.5, 0.25] or [0.1667, 0.6666, 0.1667]) should be
performed on a linearized video signal. Other filtering and weighted sums of partial terms in 
noise and de-interlace filtering should also be converted to linear form for computation. 
Which operations warrant linear processing is determined by signal excursion, and the type of 
filtering. Image sharpening can be appropriately computed in video or logarithmic non-linear
representations, since it is self-proportional. However, matrix processing, spatial filtering, 
weighted sums, and de-interlace processing should be computed using linearized digital 
values. 

[0054] As a simple example, the single field-frame de-interlacer described above
computes missing alternate lines by averaging the line above and below each actual line. This 
average is much more correct numerically and visually if this average is done linearly. Thus, 
instead of summing 0.5 times the line above plus 0.5 times the line below, the digital values 
are linearized first, then averaged, and then reconverted back into the non-linear video 
representation. 
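
The difference between the two ways of averaging can be seen in a short Python sketch, given here as an illustration under stated assumptions. It uses the typical Rec. 709 formula from the description; the helper names and the algebraically derived inverse are not part of the description.

```python
def linear_to_video(lum):
    """Linear light to video value (typical Rec. 709 formula)."""
    if lum <= 0.018051:
        return 4.5 * lum
    return 1.0993 * lum ** 0.45 - 0.0993


def video_to_linear(video):
    """Inverse of linear_to_video, derived by solving for L."""
    if video <= 4.5 * 0.018051:
        return video / 4.5
    return ((video + 0.0993) / 1.0993) ** (1.0 / 0.45)


def average_in_video(v_above, v_below):
    """Naive average taken directly on the non-linear video values."""
    return 0.5 * v_above + 0.5 * v_below


def average_in_linear(v_above, v_below):
    """Linearize, average, then convert back to the video representation."""
    lum = 0.5 * video_to_linear(v_above) + 0.5 * video_to_linear(v_below)
    return linear_to_video(lum)
```

For a large excursion, such as a dark line at video value 0.1 next to a bright line at 0.9, the naive average is 0.5 while the linear-light average comes out noticeably brighter (roughly 0.64). For two nearly equal values the two averages almost coincide, consistent with the small-excursion observation above.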

Median Filters 

[0055] In noise processing, the most useful filter is the median filter. A three element 
median filter just ranks the three entries, via a simple sort, and picks the middle one. For 
example, an X (horizontal) median filter looks at the red value (or green or blue) of three 
adjacent horizontal pixels, and picks the one with the middle-most value. If two are the same, 
that value is selected. Similarly, a Y (vertical) filter looks in the scanlines above and below 
the current pixel, and again picks the middle value. 

[0056] It has been experimentally determined that it is useful to average the results 
from applying both an X and a Y median filter to create a new noise-reducing component
picture (i.e., each new pixel is the 50% equal average of the X and Y medians for the
corresponding pixel from a source image).
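
A minimal Python sketch of the X and Y medians and their equal average is given below for illustration. The names `median3` and `xy_median_average` are hypothetical, the image is assumed to be a single color record stored as a list of rows, and only interior pixels are handled; the description does not say how image borders are treated.

```python
def median3(a, b, c):
    """Three-element median: rank the entries and pick the middle one."""
    return sorted((a, b, c))[1]


def xy_median_average(channel, x, y):
    """50% equal average of the X and Y medians at interior pixel (x, y)."""
    # X (horizontal) median over three adjacent pixels on the same scanline.
    x_med = median3(channel[y][x - 1], channel[y][x], channel[y][x + 1])
    # Y (vertical) median over the scanlines above and below.
    y_med = median3(channel[y - 1][x], channel[y][x], channel[y + 1][x])
    return 0.5 * x_med + 0.5 * y_med
```

For a noisy center value of 90 with horizontal neighbors 10 and 30 and vertical neighbors 20 and 40, the X median is 30, the Y median is 40, and the averaged result is 35.
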

[0057] In addition to X and Y (horizontal and vertical) medians, it is also possible to
take diagonal and other medians. However, the vertical and horizontal pixel values are
physically closest to any particular pixel, and therefore produce less potential error or distortion
than the diagonals. However, such other medians remain available in cases where noise 
reduction is difficult using only the vertical and horizontal medians. 

[0058] Another beneficial source of noise reduction is information from the previous 
and subsequent frames (i.e., a temporal median). As described below, motion analysis
provides the best match for moving regions. However, it is compute intensive. If a region of 
the image is not moving, or is moving slowly, the red values (and green and blue) from a 
current pixel can be median filtered with the red value at that same pixel location in the 
previous and subsequent frames. However, odd artifacts may occur if significant motion is 
present and such a temporal filter is used. Thus, it is preferred that a threshold be taken first, 
to determine whether such a median would differ more than a selected amount from the value 
of a current pixel. The threshold can be computed essentially the same as for the de- 
interlacing threshold above: 

[0059] Rdiff = R_current_pixel minus R_temporal_median
[0060] Gdiff = G_current_pixel minus G_temporal_median
[0061] Bdiff = B_current_pixel minus B_temporal_median
[0062] ThresholdingValue = abs(Rdiff + Gdiff + Bdiff) + abs(Rdiff) + abs(Gdiff) + abs(Bdiff)

[0063] The ThresholdingValue is then compared to a threshold setting. Typical
threshold settings are in the range of 0.1 to 0.3, with 0.2 being most common. Above the threshold, the
current value is kept. Below the threshold, the temporal median is used. The block diagram of
FIG. 3 also applies to this threshold test. In this case the PROCESSING block 30 is a
temporal median filter and the inputs are three successive frames. The SELECTION
CONTROL block 32 compares the output 36 of the PROCESSING block 30 with Input B 34
using the above equations for Rdiff, Gdiff, Bdiff, and ThresholdingValue. The switch selects
the PROCESSING output 36 if the ThresholdingValue is less than the threshold, otherwise 
the switch selects Input B 34, the middle value, for the output 38. 
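
The thresholded temporal median might be sketched as follows in Python, for illustration only. The function names are hypothetical, each pixel is assumed to be an (R, G, B) tuple normalized to the range 0 to 1, and the three inputs are assumed to be the same pixel location in three successive frames.

```python
def median3(a, b, c):
    """Three-element median."""
    return sorted((a, b, c))[1]


def thresholded_temporal_median(prev_rgb, cur_rgb, next_rgb, threshold=0.2):
    """Use the temporal median unless it differs too much from the current pixel."""
    # Each color record is median filtered independently over time.
    med = tuple(median3(p, c, n)
                for p, c, n in zip(prev_rgb, cur_rgb, next_rgb))
    diffs = [c - m for c, m in zip(cur_rgb, med)]
    value = abs(sum(diffs)) + sum(abs(d) for d in diffs)
    # Below the threshold the median is used; otherwise the current value is kept.
    return med if value < threshold else tuple(cur_rgb)
```

A pixel that flickers slightly between frames is replaced by its temporal median, while a pixel that changes sharply, as when an object moves through it, keeps its current value.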

[0064] An additional median type is a median taken between the X, Y, and temporal 
medians. Another median type can take the temporal median, and then take the equal average 
of the X and Y medians from it. 

[0065] Each type of median can cause problems. X and Y medians smear and blur an 
image, so that it looks "greasy". Temporal medians cause smearing of motion over time. 
Since each median can result in problems, yet each median's properties are different (and, in 
some sense, "orthogonal"), it has been determined experimentally that the best results come 
by combining a variety of medians. 

[0066] In particular, FIG. 4 shows that a preferred combination of medians is a linear
weighted sum (see the discussion above on linear video processing) of five terms to
determine the value for each pixel of a current image:

[0067] 50% of the original image (Frame N 40) (thus, the most noise reduction is
3 dB, or half);

[0068] 15% of the average of X and Y medians 42, 44, respectively;
[0069] 10% of the thresholded temporal median 46;

[0070] 10% of the average of X and Y medians of the thresholded temporal median
(48); and

[0071] 15% of a three-way X, Y, and temporal median (50).

[0072] This set of time medians does a reasonable job of reducing the noise in the 
image without making it appear "greasy" or blurred, causing temporal smearing of moving 
objects, or losing detail. Another useful weighting of these five terms is 35%, 20%, 22.5%, 
10%, and 12.5%, respectively. 
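
The five-term weighted sum can be illustrated with the short Python sketch below. The function name `combine_medians` and its argument names are hypothetical, the five inputs are assumed to be already-computed linear-light values for one color record of one pixel, and the check that the weights sum to one is an addition of this sketch.

```python
def combine_medians(original, xy_avg, temporal, xy_of_temporal, three_way,
                    weights=(0.50, 0.15, 0.10, 0.10, 0.15)):
    """Linear weighted sum of the five terms for one pixel value.

    original:       pixel from the current frame (Frame N)
    xy_avg:         average of the X and Y medians
    temporal:       thresholded temporal median
    xy_of_temporal: average of X and Y medians of the thresholded temporal median
    three_way:      three-way X, Y, and temporal median
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are expected to sum to 1")
    terms = (original, xy_avg, temporal, xy_of_temporal, three_way)
    return sum(w * t for w, t in zip(weights, terms))
```

The alternative weighting mentioned above would be passed as `weights=(0.35, 0.20, 0.225, 0.10, 0.125)`.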

[0073] In addition, it is useful to apply motion-compensation by applying center 
weighted temporal filters to a motion-compensated nxn region, as described below. This can 
be added to the median filtered image result (of five terms, just described) to further smooth
the image, providing better smoothing and detail on moving image regions. 

Motion Analysis 

[0074] In addition to "in-place" temporal filtering, which does a good job at
smoothing slow-moving details, de-interlacing and noise reduction can also be improved by
use of motion analysis. Adding the pixels at the same location in three fields or three frames
is valid for stationary objects. However, for moving objects, if temporal averaging/smoothing
is desired, it is often better to attempt to analyze prevailing motion over a small group
of pixels. For example, an nxn block of pixels (e.g., 2x2, 3x3, 4x4, 6x6, or 8x8) can be used
to search in previous and subsequent fields or frames to attempt to find a match (in the same
way MPEG-2 motion vectors are found by matching 16x16 macroblocks). Once a best match
is found in one or more previous and subsequent frames, a "trajectory" and "moving
mini-picture" can be determined. For interlaced fields, it is best to analyze comparisons, as well as
compute inferred moving mini-pictures, utilizing the results of the thresholded de-interlaced
process above. Since this process has already separated the fast-moving from the
slow-moving details, and has already smoothed the slow-moving details, the picture comparisons
and reconstructions are more applicable than individual de-interlaced fields.

[0075] The motion analysis preferably is performed by comparison of an nxn block in
the current thresholded de-interlaced image with all nearby blocks in the previous and
subsequent one or more frames. The comparison may be the absolute value of differences in
luminance or RGB over the nxn block. One frame is sufficient forward and backward if the
motion vectors are nearly equal and opposite. However, if the motion vectors are not nearly
equal and opposite, then an additional one or two frames forward and backward can help
determine the actual trajectory. Further, different de-interlacing treatments may be useful in
helping determine the "best guess" motion vectors going forward and back. One
de-interlacing treatment can be to use only individual de-interlaced fields, although this is
heavily prone to aliasing and artifacts on small moving details. Another de-interlacing
technique is to use only the three-field-frame smooth de-interlacing, without thresholding,
having weightings [0.25, 0.5, 0.25], as described above. Although details are smoothed and
sometimes lost, the trajectory may often be more correct.
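
As a minimal sketch of the block comparison just described, the following Python code searches a reference frame for the displacement that minimizes the summed absolute luminance difference over an nxn block, and tests whether the backward and forward vectors are nearly equal and opposite. The function names, the exhaustive search, the tie-breaking toward the shorter vector, and the tolerance `tol` are assumptions of this sketch, not details given in the description.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref`."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def best_match(cur, ref, bx, by, n=4, search=2):
    """Exhaustively test every displacement within +/- `search` pixels and
    return (dx, dy, sad) for the best one."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + n <= w
                    and 0 <= by + dy and by + dy + n <= h):
                continue
            sad = block_sad(cur, ref, bx, by, dx, dy, n)
            # Smaller difference wins; ties go to the shorter vector (assumed).
            key = (sad, dx * dx + dy * dy)
            if best is None or key < best[0]:
                best = (key, dx, dy, sad)
    return best[1], best[2], best[3]


def nearly_equal_and_opposite(v_back, v_fwd, tol=1):
    """True when the backward and forward vectors cancel to within `tol`
    pixels per axis, in which case one frame each way is taken as sufficient."""
    return (abs(v_back[0] + v_fwd[0]) <= tol
            and abs(v_back[1] + v_fwd[1]) <= tol)
```

When `nearly_equal_and_opposite` returns False, a caller could repeat `best_match` against one or two further frames in each direction to help settle the trajectory.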

[0076] Once a trajectory is found, a "smoothed nxn block" can be created by
temporally filtering using the motion-vector-offset pixels from the one (or more) previous
and subsequent frames. A typical filter might again be [0.25, 0.5, 0.25] or [0.1667, 0.6666,
0.1667] for three frames, and possibly [0.1, 0.2, 0.4, 0.2, 0.1] for two frames back and
forward. Other filters, with less central weight, are also useful, especially with smaller block
sizes (such as 2x2, 3x3, and 4x4). Reliability of the match between frames is indicated by the
absolute difference value. Large minimum absolute differences can be used to select more
center weight in the filter. Lower values of absolute differences can suggest a good match,
and can be used to select less center weight to more evenly distribute the average over a span
of several frames of motion-compensated blocks.
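
The creation of a smoothed nxn block can be sketched as follows, assuming the backward and forward motion vectors and the minimum absolute difference for the block are already known. The two three-frame filters are those given above; the function names and the per-pixel limit used to choose between them are hypothetical values chosen only for illustration.

```python
CENTER_HEAVY = (0.1667, 0.6666, 0.1667)  # more center weight
STANDARD = (0.25, 0.5, 0.25)             # less center weight


def pick_weights(sad, n, per_pixel_limit=4.0):
    """Choose the filter from match reliability: a large absolute difference
    per pixel selects more center weight. The limit is an assumed value."""
    return CENTER_HEAVY if sad / (n * n) > per_pixel_limit else STANDARD


def smooth_block(prev, cur, nxt, bx, by, n, v_back, v_fwd, weights):
    """Temporally filter the n x n block at (bx, by) using the pixels of the
    previous and next frames offset by their motion vectors (dx, dy)."""
    w_prev, w_cur, w_next = weights
    out = []
    for j in range(n):
        row = []
        for i in range(n):
            p = prev[by + v_back[1] + j][bx + v_back[0] + i]
            c = cur[by + j][bx + i]
            q = nxt[by + v_fwd[1] + j][bx + v_fwd[0] + i]
            row.append(w_prev * p + w_cur * c + w_next * q)
        out.append(row)
    return out
```

The sketch assumes the displaced blocks lie wholly inside the neighboring frames; a fuller implementation would need an edge rule for vectors that point outside the picture.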

[0077] These filter weights can be applied to: individual de-interlaced motion- 
compensated field-frames; thresholded three-field-frame de-interlaced pictures, described 
above; and non-thresholded three-field-frame de-interlaced images, with a [0.25, 0.5, 0.25] 
weighting, also as described above. However, the best filter weights usually come from 
applying the motion-compensated block linear filtering to the thresholded three-field-frame 
result described above. This is because the thresholded three-field-frame image is both the 
smoothest (in terms of removing aliasing in smooth areas), as well as the most motion- 
responsive (in terms of defaulting to a single de-interlaced field-frame above the threshold). 
Thus, the motion vectors from motion analysis can be used as the inputs to multi-frame or
multi-de-interlaced-field-frame or single-de-interlaced-field-frame filters, or combinations
thereof. The thresholded multi-field-frame de-interlaced images, however, form the best filter
input in most cases.

[0078] The use of motion analysis is computationally expensive for a large search 
region, when fast motion might be found (such as ±32 pixels). Accordingly, it may be best to 
augment the speed by using special-purpose hardware or a digital signal processor assisted 
computer. 
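
To give a rough sense of the cost, the following sketch counts the absolute-difference operations needed for an exhaustive search of every block against one reference frame. The function name, the exhaustive search strategy, and the 720x480 frame size used in the example are assumptions made for illustration only.

```python
def sad_operations(width, height, n, search):
    """Absolute-difference operations for an exhaustive +/- `search` pixel
    search of every n x n block against a single reference frame."""
    blocks = (width // n) * (height // n)
    candidates = (2 * search + 1) ** 2  # displacements tested per block
    return blocks * candidates * n * n


# Example: 8x8 blocks, +/-32 pixel search, assumed 720x480 frame.
ops = sad_operations(720, 480, 8, 32)
```

For these assumed values the count is 1,460,160,000 operations per reference frame, before any additional frames forward or backward are considered.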

[0079] Once motion vectors are found, together with their absolute difference 
measure of accuracy, they can be utilized for the complex process of attempting frame rate 
conversion. However, occlusion issues (objects obscuring or revealing others) will confound 
matches, and cannot be accurately inferred automatically. Occlusion can also involve 
temporal aliasing, as can normal image temporal undersampling and its beat with natural 
image frequencies (such as the "backward wagon wheel" effect in movies). These problems
often cannot be unraveled by any known computation technique, and to date require human 
assistance. Thus, human scrutiny and adjustment, when real-time automatic processing is not 
required, can be used for off-line and non-real-time frame-rate conversion and other similar 
temporal processes. 

[0080] De-interlacing is a simple form of the same problem. Just as with frame-rate
conversion, the task of de-interlacing is theoretically impossible to perform perfectly. This is
especially due to the temporal undersampling (closed shutter), and an inappropriate temporal
sample filter (i.e., a box filter). However, even with correct samples, issues such as occlusion
and interlace aliasing further ensure the theoretical impossibility of correct results. The cases
where this is visible are mitigated by the depth of the tools, as described here, which are
applied to the problem. Pathological cases will always exist in real image sequences. The
goal can only be to reduce the frequency and level of impairment when these sequences are
encountered. However, in many cases, the de-interlacing process can be acceptably fully
automated, and can run unassisted in real-time. Even so, there are many parameters which
can often benefit from manual adjustment.

Filter Smoothing of High Frequencies 

[0081] In addition to median filtering, reducing high frequency detail will also reduce
high frequency noise. However, this smoothing comes at the price of loss of sharpness and
detail. Thus, only a small amount of such smoothing is generally useful. A filter which
creates smoothing can be easily made, as with the threshold for de-interlacing, by
down-filtering with a normal filter (e.g., a truncated sinc filter) and then up-filtering with a Gaussian
filter. The result will be smoothed because it is devoid of high frequency picture detail. When
such a term is added, it typically must be in very small amounts, such as 5% to 10%, in order
to provide a small amount of noise reduction. In larger amounts, the blurring effect generally
becomes quite visible.
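
A one-dimensional sketch of such a down-up smoothing term, applied to a single row of pixels, is given below. The factor-of-two decimation, the kernel radius, the Gaussian width `sigma`, the edge clamping, and the function names are all assumptions of this sketch; only the truncated sinc, the Gaussian, and the 5% to 10% range come from the description.

```python
import math


def sinc(t):
    """Normalized sinc function, sin(pi t) / (pi t)."""
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)


def down_filter(row, radius=4):
    """Halve the sample rate with a truncated sinc low-pass whose cutoff is
    the new Nyquist frequency; coordinates are clamped at the edges."""
    offsets = range(-radius, radius + 1)
    taps = [sinc(k / 2.0) for k in offsets]
    norm = sum(taps)
    out = []
    for center in range(0, len(row), 2):
        acc = 0.0
        for k, t in zip(offsets, taps):
            idx = min(max(center + k, 0), len(row) - 1)
            acc += t * row[idx]
        out.append(acc / norm)
    return out


def up_filter(low, length, sigma=1.0):
    """Resample back to `length` samples using normalized Gaussian weights
    centered on each output position in low-resolution coordinates."""
    out = []
    for x in range(length):
        pos = x / 2.0
        acc = norm = 0.0
        for i, v in enumerate(low):
            w = math.exp(-((i - pos) ** 2) / (2.0 * sigma * sigma))
            acc += w * v
            norm += w
        out.append(acc / norm)
    return out


def add_smoothing(row, amount=0.075):
    """Blend a small `amount` (0.05 to 0.10 per the text) of the smoothed
    row into the original row."""
    smooth = up_filter(down_filter(row), len(row))
    return [(1.0 - amount) * v + amount * s for v, s in zip(row, smooth)]
```

Because both kernels are normalized, a constant row is returned unchanged, and only the high frequency content is attenuated by the blended term.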

Base Layer Noise Filtering 

[0082] The filter parameters for the median filtering described above for an original
image should be matched to the noise characteristics of the film grain or image sensor that
captured the image. After this median filtered image is down-filtered to generate an input to
the base layer compression process, it still contains a small amount of noise. This noise may
be further reduced by a combination of another X-Y median filter (equally averaging the X
and Y medians), plus a very small amount of the high frequency smoothing filter. A preferred
filter weighting of these three terms, applied to each pixel of the base layer, is:

[0083] 75% of the original base layer (down filtered from median-filtered original
above);

[0084] 22.5% of the average of X and Y medians; and

[0085] 7.5% of the down-up smoothing filter.

[0086] This small amount of additional filtering in the base layer provides a small 
additional amount of noise reduction and improved stability, resulting in better MPEG 
encoding and limiting the amount of noise added by such encoding. 
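
The three-term base layer weighting can be sketched as a per-pixel blend of three images of equal size. The function name and the image arguments are hypothetical. The percentages as listed above total 105%, so this sketch divides by the sum of the weights in order to leave overall brightness unchanged; that normalization is an assumption of the sketch and is not stated in the description.

```python
def blend_base_layer(base, xy_median_avg, smoothed,
                     weights=(0.75, 0.225, 0.075)):
    """Weighted sum of the base layer, the average of its X and Y medians,
    and its down-up smoothed version, normalized by the weight total
    (assumed, so that a flat image is left unchanged)."""
    w_base, w_med, w_smooth = weights
    total = w_base + w_med + w_smooth
    return [
        [(w_base * b + w_med * m + w_smooth * s) / total
         for b, m, s in zip(row_b, row_m, row_s)]
        for row_b, row_m, row_s in zip(base, xy_median_avg, smoothed)
    ]
```

Passing different weights, for example a set that already sums to 1, leaves the rest of the sketch unchanged.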

COMPUTER IMPLEMENTATION 
[0087] The invention may be implemented in hardware or software, or a combination
of both. However, preferably, the invention is implemented in computer programs executing
on one or more programmable computers each comprising at least a processor, a data storage
system (including volatile and non-volatile memory and/or storage elements), an input 
device, and an output device. Program code is applied to input data to perform the functions 
described herein and generate output information. The output information is applied to one or 
more output devices, in known fashion. 

[0088] Each such program may be implemented in any desired computer language 
(including machine, assembly, or high level procedural, logical, or object oriented 
programming languages) to communicate with a computer system. In any case, the language 
may be a compiled or interpreted language. 

[0089] Each such computer program is preferably stored on a storage media or device 
(e.g., ROM, CD-ROM, or magnetic or optical media) readable by a general or special 
purpose programmable computer system, for configuring and operating the computer when 
the storage media or device is read by the computer system to perform the procedures 
described herein. The inventive system may also be considered to be implemented as a 
computer-readable storage medium, configured with a computer program, where the storage 
medium so configured causes a computer system to operate in a specific and predefined 
manner to perform the functions described herein. 

[0090] A number of embodiments of the invention have been described. Nevertheless, 
it will be understood that various modifications may be made without departing from the 
spirit and scope of the invention. For example, while the preferred embodiment uses 
MPEG-2 or MPEG-4 coding and decoding, the invention will work with any comparable 
standard that provides equivalents of I, P, and/or B frames and layers. Accordingly, it is to be
understood that the invention is not to be limited by the specific illustrated embodiment, but
only by the scope of the appended claims. 